Shadows: a new means of representing documents
Abstract
Document production tools are ubiquitous, resulting in exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, annotation and retrieval. Information retrieval mechanisms concentrate on textual features (corpus analysis), while annotation approaches either target specific formats or require that a document follow interoperable standards defined via schemas. This work presents our effort to address these problems with a more flexible solution. Rather than modifying or converting the document itself, or targeting only textual characteristics, the strategy described here is based on an intermediate descriptor: the document shadow. A shadow represents domain-relevant aspects and elements of both the structure and the content of a given document. Shadows are not restricted to describing textual features; they also cover other elements, such as multimedia artifacts. Furthermore, shadows can be stored in a database, which supports queries on document structure and content regardless of document format.
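As a rough illustration of the idea that shadows can be stored in a database and queried independently of document format, the sketch below keeps one row per shadow element in an in-memory SQLite table. The table layout, the function names (`open_shadow_store`, `store_shadow`, `find_documents`), the element kinds and the sample documents are all hypothetical and are not taken from the paper; the abstract does not specify how shadows are actually encoded.

```python
import sqlite3

# Hypothetical layout (an assumption, not the paper's schema): one row per
# domain-relevant element of a document, tagged with the original format.
SCHEMA = """
CREATE TABLE shadow_element (
    doc_id   TEXT NOT NULL,
    doc_fmt  TEXT NOT NULL,
    kind     TEXT NOT NULL,
    position INTEGER NOT NULL,
    content  TEXT
)
"""


def open_shadow_store():
    """Create an in-memory database holding shadow elements."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def store_shadow(conn, doc_id, doc_fmt, elements):
    """Store a shadow given as an ordered list of (kind, content) pairs."""
    rows = [
        (doc_id, doc_fmt, kind, position, content)
        for position, (kind, content) in enumerate(elements)
    ]
    conn.executemany("INSERT INTO shadow_element VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def find_documents(conn, kind, keyword):
    """Return ids of documents with an element of `kind` mentioning `keyword`."""
    cursor = conn.execute(
        "SELECT DISTINCT doc_id FROM shadow_element "
        "WHERE kind = ? AND content LIKE ? ORDER BY doc_id",
        (kind, f"%{keyword}%"),
    )
    return [row[0] for row in cursor]


# Hypothetical sample shadows for three documents in different formats.
conn = open_shadow_store()
store_shadow(conn, "report.pdf", "pdf",
             [("section", "Rainfall measurements"),
              ("image", "Map of rainfall stations")])
store_shadow(conn, "notes.odt", "odt",
             [("section", "Field notes"),
              ("image", "Photo of rainfall gauge")])
store_shadow(conn, "slides.ppt", "ppt",
             [("section", "Rainfall summary")])
```

With this layout, a call such as `find_documents(conn, "image", "rainfall")` searches image descriptions across all stored documents without looking at the original files, which is the format-independence the abstract points to.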
Similar resources
A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier
With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), in order to reduce the large size of the feature space in TDC. TDC includes different actions such as text processing, feature extraction, form...
Shadow-driven Document Representation: A summarization-based strategy to represent non-interoperable documents
Document production tools are present everywhere, resulting in an exponential growth of increasingly complex, distributed and heterogeneous documents. This hampers document exchange, as well as their annotation, indexing and retrieval. Existing approaches to these tasks either concentrate on specific formats or require representing document’s content using interoperable standards or schema. Thi...
No fire without smoke: smoke rendering and light interaction for real-time computer graphics
Realism in computer graphics depends upon digitally representing what we see in the world with careful attention to detail, which usually requires a high degree of complexity in modelling the scene. With some computer graphics applications developers have to limit the complexity of the scene to allow the application to run in real-time on modern consumer grade graphics hardware. This trade-off ...
Comparison of Aerobic Sporadic Bacilli Structure with Electron Microscopy
1. The existence of differences and similarities in the surface features not only of different organisms or groups but also within given species has been demonstrated by a variety of techniques. 2. The different reactions of various organisms to the Gram stain might well be taken as one piece of evidence, the use of the electron microscope and associated preparative techniques (includi...
Removing car shadows in video images using entropy and Euclidean distance features
Detecting car motion in video frames is one of the key subjects in the computer vision community. In recent years, different approaches have been proposed to address this issue. One of the main challenges for image processing systems developed for car detection is car shadows. Shadows change the appearance of cars in such a way that they may seem attached to neighboring cars. This study aims ...
Publication date: 2012